Introduction
What is the bugRzilla Package?
BugRzilla is an R package that helps the user to interact with the Bugzilla through an API.
To learn more, see bugRzilla.
About the BugRzilla Google Summer of COde Project:-
The project bugRzilla is an issue tracker for the R-core members. The goal of the project is to help users to submit issues to R Bugzilla.
About the This Project:-
Explore the issues and bugs on the R Bugzilla to make the submission from bugRzilla better. It might help to identify useful patterns for R core or report the status of the R Bugzilla.
To learn more, see bugzilla_viz.
Setup Database on your local system
Download SQL and MySQL Workbench
To install SQL on Ubuntu one can refer a blog post by digitalocean. To install MySQL workbench on Ubuntu one can refer a blog post by linuxhint
Download R_bugzilla data
- The R_bugzilla data can be downloaded from link.
-
Since the downloaded data is a zip file so make sure you unzip the file before dumping the file which will have an extension
.sql(eg: R-bugs.sql).
Dump downloaded R_bugzilla to MySQL workbench.
Before one import the R_bugzilla SQL file one may need to create the (empty) database from MySQL if it doesn’t exist already and the exported SQL don’t contain CREATE DATABASE (exported with –no-create-db or -n option), before you can import it.
After considering this open your Terminal and run the command: mysqldump -u my_username -p database_name > output_file_path
-
The
-uflag indicates that the MySQLusernamewill follow. -
The
-pflag indicates we should be prompted for thepasswordassociated with the above username.database_nameis of course the exact name of the database to export. -
The
>symbol is a Unix directive forSTDOUT, which allows Unix commands to output the text results of the issued command to another location. In this case, that output location is a file path, specified byoutput_file_path.
bugRzilla Analysis
For the connection to the database, I’m using the dplyr package, it supports to the widely-used open source databases like MySQL.
The libraries used for the analysis:
# loading packages
library(dplyr)##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(dbplyr)##
## Attaching package: 'dbplyr'
## The following objects are masked from 'package:dplyr':
##
## ident, sql
library(RMySQL)## Loading required package: DBI
library(DBI)
library(DT)
library(tidyverse)## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.2 ✓ stringr 1.4.0
## ✓ tidyr 1.1.3 ✓ forcats 0.5.1
## ✓ readr 1.4.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dbplyr::ident() masks dplyr::ident()
## x dplyr::lag() masks stats::lag()
## x dbplyr::sql() masks dplyr::sql()
library(ggplot2)
library(plotly)##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
Connect bugRzilla SQL Database with R
# Connecting R with MySQL
con <- dbConnect(
MySQL(),
dbname='bugRzilla', # change the database name to your database name
username='root', # change the username to your username
password='1204', # update your password
host='localhost',
port=3306)
# Accessing Tables names from the Database
DBI::dbListTables(con)## [1] "attachments" "bugs" "bugs_activity" "bugs_fulltext"
## [5] "bugs_mod" "components" "longdescs"
Data Exploartion of Bugs Table from the Database
bugs_df <- tbl(con, "bugs")## Warning in .local(conn, statement, ...): Decimal MySQL column 24 imported as
## numeric
## Warning in .local(conn, statement, ...): Decimal MySQL column 25 imported as
## numeric
# Converting `bugs_df` to `dataframe` and showing the `datatable`
bugs_df <- as.data.frame(bugs_df)## Warning in .local(conn, statement, ...): Decimal MySQL column 24 imported as
## numeric
## Warning in .local(conn, statement, ...): Decimal MySQL column 25 imported as
## numeric
datatable(bugs_df, options = list(scrollX = TRUE,
pageLength = 5, lengthMenu = c(5, 10, 50, 100)))## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html
From the above table we can conclude that the few of the columns are having wrong datatype like:
- creation_ts
- delta_ts
- estimated_time
- remaining_time
- deadline
- target_milestone
- qa_contact
- status_whiteboard
Cleaning the data
First steps, check the data and prepare it for what we want:
bugs_df <- bugs_df %>%
mutate_at(vars("creation_ts", "delta_ts", "lastdiffed", "deadline"), as.Date)Visualizations
# Line plot for creation_ts, delta_ts, deadline
creation <- bugs_df$creation_ts
delta <- bugs_df$delta_ts
deadline <- bugs_df$deadline
last_diffed <- bugs_df$lastdiffed
bug_id <- bugs_df$bug_id
component_id <- bugs_df$component_id
resolution <- bugs_df$resolution
data <- data.frame(bug_id, creation)
fig <- plot_ly(data, y = ~bug_id, x = ~creation, type = 'scatter', mode = 'markers')
fig## Warning: Ignoring 14 observations
data <- data.frame(bug_id, delta)
fig <- plot_ly(data, y = ~bug_id, x = ~delta, type = 'scatter', mode = 'markers')
fig## Warning: Ignoring 30 observations
data <- data.frame(bug_id, deadline)
fig <- plot_ly(data, y = ~bug_id, x = ~deadline, type = 'scatter', mode = 'markers')
fig## Warning: Ignoring 7008 observations
# bar plot with bug_id and resolution
data <- data.frame(bug_id, resolution)
fig <- plot_ly(data, x = ~resolution, y = ~bug_id, name = 'creation', type = 'bar')
fig